MySQL: Large VARCHAR vs. TEXT?




I've got a messages table in MySQL that records messages between users. Apart from the typical ids and message types (all integer types), I need to save the actual message text as either VARCHAR or TEXT. I'm setting a front-end limit of 3000 characters, which means no message would ever be inserted into the db longer than this.



Is there a rationale for going with either VARCHAR(3000) or TEXT? There's something about just writing VARCHAR(3000) that feels somewhat counter-intuitive. I've been through other similar posts on Stack Overflow, but it would be good to get views specific to this type of common message storage.





A bit old, but I came here because I ran into a problem that made me think about this. In my case my front-end form was limited to 2,000 characters, but the encoding implicit in my storage method encoded international characters as multiple characters (apparently anywhere from 3 to 12 per character). So my 2,000 suddenly becomes up to 24,000. Something to think about...
– James S
Mar 24 '14 at 22:40
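James S doesn't say which storage method inflated his text. One common culprit is escaping non-ASCII characters as HTML/XML numeric character references (é becomes &#233;). A minimal Python sketch, assuming that kind of escaping (escaped_length is a hypothetical helper), shows how far past a 2,000-character front-end limit the stored string can grow:

```python
def escaped_length(text: str) -> int:
    """Length of `text` after every non-ASCII character is replaced by an
    HTML/XML numeric character reference such as '&#233;'."""
    return len(text.encode("ascii", "xmlcharrefreplace"))


# A 2,000-character comment made entirely of accented Latin letters...
accented = "é" * 2000
# ...and one made entirely of emoji (code points above U+FFFF).
emoji = "\U0001F600" * 2000

print(len(accented), escaped_length(accented))  # 2000 12000
print(len(emoji), escaped_length(emoji))        # 2000 18000
```

The exact multiplier depends on the escaping scheme; the point is that the front-end character count does not bound the stored length.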





I have found text to be significantly faster for many concurrent inserts.
– Ray S.
Mar 28 '14 at 8:47





@JamesS: utf8mb4... >.<
– indivisible
Apr 20 '15 at 22:44





I have voted to close this because it is out of date. New row formats were introduced later in 2010, invalidating many of the answers. Such answers are being quoted as gospel; it would be better to get rid of this thread than to leave wrong info around.
– Rick James
Jun 22 at 22:07





@RickJames consider posting an updated answer, rather than close the question
– Yvette Colomb
Jun 24 at 2:19




6 Answers



TEXT and BLOB are stored off the table, with the table just holding a pointer to the location of the actual storage.





VARCHAR is stored inline with the table. VARCHAR is faster when the size is reasonable; which of the two is faster in your case depends on your data and your hardware, so you'd want to benchmark a real-world scenario with your data.





Update: Whether VARCHAR or TEXT is stored inline or off-record depends on data size, column size, row_format, and MySQL version. It does not depend on "text" vs "varchar".







+1: VARCHAR (stored inline) is usually faster IF the data is frequently retrieved (included by most queries). However, for a large volume of data that is not normally retrieved (that is, not referenced by any query), then it may be better to not have the data stored inline. There is an upper limit on the row size, for data stored inline.
– spencer7593
Jan 14 '11 at 17:54





Can you include any source? Where have you read it? Thanks.
– santiagobasulto
Oct 4 '11 at 19:56





@Pacerier: the exact benefit of avoiding "inline" storage is an increase in the number of rows that can be stored in a block, which means the table rows occupy fewer blocks in the InnoDB buffer cache (smaller memory footprint), and means fewer blocks to be transferred to and from disk (reduced I/O). But, this is only a performance benefit if the columns stored "off row" are largely unreferenced by queries. If those "off row" columns are referenced by most queries, that benefit largely evaporates. Inline is preferred if the columns fit in the max rowsize and are frequently referenced.
– spencer7593
Jun 4 '13 at 22:35
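To put rough numbers on spencer7593's point, here is a back-of-the-envelope Python sketch. It assumes the default 16 KB InnoDB page, the 20-byte pointer that the DYNAMIC row format leaves in the record for a column stored fully off-page, and a hypothetical messages row with about 24 bytes of integer columns; page and record headers are ignored, so the counts are only indicative. It models the trade-off being discussed, not the decision InnoDB itself would make for a 3,000-byte value.

```python
PAGE_SIZE = 16 * 1024   # default InnoDB page size
OFF_PAGE_POINTER = 20   # bytes left in the record when a column lives off-page (DYNAMIC)


def rows_per_page(fixed_bytes: int, text_bytes: int, inline: bool) -> int:
    """Rough number of rows per page, ignoring page and record headers.

    fixed_bytes: size of the always-inline columns (ids, message type, ...)
    text_bytes:  size of the message text on this row
    inline:      True if the text is stored in the record, False if off-page
    """
    row = fixed_bytes + (text_bytes if inline else OFF_PAGE_POINTER)
    return PAGE_SIZE // row


# Hypothetical row: ~24 bytes of integer columns plus a 3,000-byte message.
print(rows_per_page(24, 3000, inline=True))   # 5
print(rows_per_page(24, 3000, inline=False))  # 372
```

More rows per page means fewer pages to cache and read for queries that never touch the message; a query that does read it has to fetch the overflow page as well, which is where the benefit evaporates.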






"VARCHAR is faster when the size is reasonable". What is a "reasonable" number of characters, 100? 1000? 100,000?
– tim peterson
Sep 22 '13 at 13:56





This answer is not correct for InnoDB. Both VARCHAR and BLOB/TEXT are stored inline with other columns if the value on a given row fits in the page size (16KB and each page must hold at least two rows). If the string is too large for that, it overflows to additional pages. See mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb for a detailed explanation.
– Bill Karwin
Jan 1 '14 at 21:43
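Bill Karwin's rule can be sketched in Python. The MySQL manual describes the DYNAMIC row format this way: when a row is too long, the longest columns are chosen for off-page storage, each leaving a 20-byte pointer, until the record fits on the page, and TEXT/BLOB values of 40 bytes or less always stay inline. The sketch assumes a 16 KB page and approximates "at least two rows per page" as half a page (the real limit is slightly smaller); COMPACT and REDUNDANT behave differently, keeping a 768-byte prefix inline. choose_off_page is a hypothetical helper, not an InnoDB API.

```python
from typing import Dict, List

PAGE_SIZE = 16 * 1024        # default innodb_page_size
MAX_RECORD = PAGE_SIZE // 2  # simplification of "at least two rows per page"
POINTER_BYTES = 20           # pointer left in the record for an off-page column
ALWAYS_INLINE = 40           # values this short are never moved off-page


def choose_off_page(columns: Dict[str, int], fixed_bytes: int = 0) -> List[str]:
    """Simplified DYNAMIC-format model: move the longest variable-length
    columns off-page until the record fits. `columns` maps a column name to
    the byte length of its value on this row; `fixed_bytes` covers the rest."""
    moved: List[str] = []
    record = fixed_bytes + sum(columns.values())
    # Consider columns from longest to shortest (ties keep their original order).
    for name in sorted(columns, key=columns.get, reverse=True):
        if record <= MAX_RECORD:
            break                      # the record already fits
        if columns[name] <= ALWAYS_INLINE:
            break                      # everything left is too short to move
        record += POINTER_BYTES - columns[name]  # value replaced by a pointer
        moved.append(name)
    return moved


# Hypothetical messages row: 24 bytes of integers, a 200-byte subject, a body.
print(choose_off_page({"subject": 200, "body": 3000}, fixed_bytes=24))   # []
print(choose_off_page({"subject": 200, "body": 12000}, fixed_bytes=24))  # ['body']
```

Under this model a 3,000-byte message stays in the row regardless of whether the column is declared VARCHAR or TEXT; only the value's size on that row decides.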



Can you predict how long the user input would be?



VARCHAR(X)
Case: user name, email, country, subject, password

TEXT
Case: messages, emails, comments, formatted text, html, code, images, links

MEDIUMTEXT
Case: large json bodies, short to medium length books, csv strings

LONGTEXT
Case: textbooks, programs, years of log files, harry potter and the goblet of fire, scientific research logging





Predictability is really a side item here. It's actually maximum expected length that should be the deciding factor. The items you mention as more predictable are only that way because they are shorter than the others.
– Andrew Barber
Nov 1 '12 at 19:46





@andrew-barber That's my point, though. All the other posts explain the differences well, but not the situations where you actually have to choose between the two. I was trying to point out that VARCHAR is a good choice for predictably short input and TEXT is a good choice for arbitrarily long input.
– Michael J. Calkins
Nov 1 '12 at 20:28






If all the columns are short and predictable (e.g. MAC address, IMEI, and other values whose length never changes), then use CHAR columns and you can make your row size fixed, which should speed things up considerably with MyISAM, and possibly also with InnoDB, although I am not sure about it.
– Matt
Apr 2 '13 at 19:15





@MichaelJ.Calkins That changed in MySQL 5.6: you now also have full-text search in InnoDB. See dev.mysql.com/doc/refman/5.6/en/fulltext-search.html
– PhoneixS
Jun 5 '15 at 8:17





Maximum lengths (in bytes, not characters): TINYTEXT: 255; TEXT: 65,535; MEDIUMTEXT: 16,777,215; LONGTEXT: 4,294,967,295.
– Victor Stoddard
Feb 22 '17 at 1:46
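Combining these byte limits with the worst-case bytes per character gives a quick way to pick the smallest TEXT type for a given character limit. A Python sketch, assuming utf8mb4 (up to 4 bytes per character) by default; smallest_text_type is a hypothetical helper name:

```python
# Maximum storage of each TEXT type, in bytes.
TEXT_TYPES = [
    ("TINYTEXT", 255),
    ("TEXT", 65_535),
    ("MEDIUMTEXT", 16_777_215),
    ("LONGTEXT", 4_294_967_295),
]


def smallest_text_type(max_chars: int, bytes_per_char: int = 4) -> str:
    """Smallest TEXT type that can hold `max_chars` characters when each
    character may take up to `bytes_per_char` bytes (4 for utf8mb4)."""
    needed = max_chars * bytes_per_char
    for name, limit in TEXT_TYPES:
        if needed <= limit:
            return name
    raise ValueError(f"{max_chars} characters exceed LONGTEXT")


print(smallest_text_type(3000))                    # TEXT
print(smallest_text_type(3000, bytes_per_char=1))  # TEXT
print(smallest_text_type(50))                      # TINYTEXT
print(smallest_text_type(20000))                   # MEDIUMTEXT
```

For the question's 3,000-character limit, TEXT is enough even under utf8mb4 (12,000 bytes at most).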



Just to clarify the best practice:



Text format messages should almost always be stored as TEXT (they end up being arbitrarily long)



String attributes should be stored as VARCHAR (the destination user name, the subject, etc...).



I understand that you've got a front end limit, which is great until it isn't. *grin* The trick is to think of the DB as separate from the applications that connect to it. Just because one application puts a limit on the data, doesn't mean that the data is intrinsically limited.



What is it about the messages themselves that forces them to never be more than 3000 characters? If it's just an arbitrary application constraint (say, for a text box or something), use a TEXT field at the data layer.







What does "which is great until it isn't" mean? What does "isn't" refer to?
– Pacerier
Jul 16 '15 at 8:13






@Pacerier To give you an example of the "isn't" James is likely talking about: take Twitter, which until very recently had a 140-character limit on PMs. They decided it was no longer sensible and removed that limit completely. If they hadn't thought ahead about that (which I'm pretty sure they did...), they would have run into the scenario outlined above.
– PaulSkinner
Sep 3 '15 at 10:42





I am just putting up our new database, and I'd assumed nobody could possibly put more than 2000 characters into our tiny comment boxes. Then tonight, as James notes, it suddenly "wasn't ok": a user submitted a perfectly valid comment that was 2600 characters long. I'd used varchar(2000) thinking it couldn't possibly get longer than that, and I was wrong. So yes, it's great until it isn't; in our case that took only a few days to manifest. I think I will use Michael J. Calkins's rule from now on: text for messages and comments.
– Lizardx
Feb 11 '16 at 8:17






@Pacerier "which is great until it isn't great". In other words, it works almost all the time and is wonderful...except those exceptional situations where it isn't so great.
– Limited Atonement
Mar 30 '16 at 14:06





@Pacerier another interesting example is mentioned in the comments: he had a front-end limit of 2,000 characters, but the characters entered were in a code page that actually used more bytes than plain letters, so his database ended up needing room for 24k characters just to account for the actual byte size of the characters entered.
– RaptorX
Jul 1 '16 at 16:45



Disclaimer: I'm not a MySQL expert ... but this is my understanding of the issues.



I think TEXT is stored outside the MySQL row, while VARCHAR is stored as part of the row. There is a maximum row length for MySQL rows, so a large VARCHAR limits how much other data you can store in the same row.



Also, because VARCHAR forms part of the row, I suspect that queries looking at that field will be slightly faster than those using a TEXT chunk.





The row length limit is 65,535 bytes [ dev.mysql.com/doc/refman/5.0/en/column-count-limit.html ]. If your column is utf8-encoded, that means a 3000-character varchar column can take up to 9000 bytes.
– Jan Fabry
Jan 7 '10 at 21:05








UTF-8 characters can be up to 4 bytes, so I think you meant 12,000 bytes (unless there is some MySQL thing I'm not understanding here).
– raylu
Jul 10 '11 at 3:15





@raylu MySQL's UTF-8 ("utf8") is "fake UTF-8" in that it supports at most 3 bytes per character, so there is no way to directly store Unicode characters beyond the BMP in MySQL's UTF-8. This is fixed in MySQL 5.5 with the utf8mb4 character set.
– Pacerier
Jul 6 '12 at 5:28
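The byte arithmetic in this thread can be checked with a short Python sketch. It assumes worst-case widths of 1, 3, and 4 bytes per character for latin1, utf8mb3 (MySQL's original "utf8"), and utf8mb4, plus the length prefix MySQL adds to VARCHAR: 1 byte when the column's maximum data size is at most 255 bytes, 2 bytes otherwise. The last number printed is how many such VARCHAR(3000) columns would fit under the 65,535-byte row limit, ignoring every other column; varchar_max_bytes is a hypothetical helper.

```python
ROW_LIMIT = 65_535  # max combined size of a MySQL row, in bytes (TEXT/BLOB bodies excluded)

# Worst-case bytes per character for the character sets discussed above.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def varchar_max_bytes(length: int, charset: str) -> int:
    """Worst-case bytes for VARCHAR(length): data plus the length prefix,
    which is 1 byte if the max data size is <= 255 bytes, else 2 bytes."""
    data = length * MAX_BYTES_PER_CHAR[charset]
    return data + (1 if data <= 255 else 2)


for cs in ("latin1", "utf8mb3", "utf8mb4"):
    size = varchar_max_bytes(3000, cs)
    print(cs, size, ROW_LIMIT // size)
# latin1 3002 21
# utf8mb3 9002 7
# utf8mb4 12002 5
```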






I believe that this assertion is valid for MyISAM only. I can't find a definitive source but I believe that InnoDB stores TEXT inline in the table as well.
– dotancohen
Dec 2 '13 at 14:39







@dotancohen I found a source here explaining that storing of variable length data using InnoDB may vary (can be stored externally or inline within the row) mysqlserverteam.com/externally-stored-fields-in-innodb
– KiX Ortillan
Aug 28 '15 at 0:43




Short answer: No practical, performance, or storage, difference.



Long answer:



There is essentially no difference (in MySQL) between VARCHAR(3000) (or any other large limit) and TEXT. The former will truncate at 3000 characters; the latter will truncate at 65535 bytes. (I make a distinction between bytes and characters because a character can take multiple bytes.)
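The character-versus-byte distinction can be made concrete with a Python sketch. It assumes utf8mb4, whose stored encoding is standard UTF-8, so len() on the Python string counts characters the way VARCHAR(n) does, while len() on the encoded bytes measures what TEXT's 65,535 limit counts. fits_varchar and fits_text are hypothetical helpers. (In strict SQL mode, MySQL rejects an over-long value with an error rather than silently truncating it.)

```python
TEXT_MAX_BYTES = 65_535


def fits_varchar(value: str, declared_length: int) -> bool:
    """VARCHAR(n) limits the number of characters."""
    return len(value) <= declared_length


def fits_text(value: str) -> bool:
    """TEXT limits the number of bytes; utf8mb4 stores standard UTF-8."""
    return len(value.encode("utf-8")) <= TEXT_MAX_BYTES


emoji = "\U0001F600" * 3000          # 3,000 characters, 12,000 bytes
print(fits_varchar(emoji, 3000), fits_text(emoji))            # True True

ascii_text = "a" * 20_000            # 20,000 characters, 20,000 bytes
print(fits_varchar(ascii_text, 3000), fits_text(ascii_text))  # False True

too_many = "\U0001F600" * 20_000     # 80,000 bytes
print(fits_text(too_many))                                    # False
```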





For smaller limits in VARCHAR, there are some advantages over TEXT.





Rebuttal to other answers



The original question asked one thing (which datatype to use); the accepted answer answered something else (off-record storage). That answer is now out of date.



When this thread was started and answered, there were only two "row formats" in InnoDB (REDUNDANT and COMPACT). Soon afterwards, two more formats (DYNAMIC and COMPRESSED) were introduced.





The storage location for TEXT and VARCHAR() is based on size, not on the name of the datatype. For an updated discussion of on/off-record storage of large text/blob columns, see this .





The preceding answers don't emphasize the main problem enough: even in very simple queries (SELECT t2.* FROM t1, t2 WHERE t2.id = t1.id ORDER BY t1.id) a temporary table can be required, and if a VARCHAR field is involved, it is converted to a CHAR field in the temporary table. So if your table has, say, 500,000 rows with a VARCHAR(65000) field, this column alone will use 500,000 * 65,000 = 3.25 * 10^10 bytes (32.5 GB). Such temp tables can't be handled in memory and are written to disk. The impact can be expected to be catastrophic.



Source (with metrics): https://nicj.net/mysql-text-vs-varchar-performance/
(This refers to the handling of TEXT vs VARCHAR in the "standard"(?) MyISAM storage engine. It may be different in others, e.g., InnoDB.)
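The temp-table arithmetic above, as a Python sketch. It assumes a single-byte character set (which VARCHAR(65000) effectively requires, since any multi-byte character set would push the column past the 65,535-byte row limit) and that the temporary table pads the column to its full declared width, as the answer describes; temp_table_column_bytes is a hypothetical helper.

```python
def temp_table_column_bytes(rows: int, declared_chars: int, bytes_per_char: int = 1) -> int:
    """Worst-case bytes for one VARCHAR column once it is padded to its full
    declared width (as a fixed-width CHAR) in a temporary table."""
    return rows * declared_chars * bytes_per_char


latin1 = temp_table_column_bytes(500_000, 65_000)
print(latin1, latin1 / 10**9)  # 32500000000 32.5
```

The cost scales with the declared width, not with the data actually stored, which is why an oversized VARCHAR limit can hurt even when most values are short.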





