Pat describes roughly what I had in mind in much greater detail. You should not be worried about the row counts, here. This solution scales much better than the one you have purposed, especially for large row counts.
I have 3 million row tables containing more data than this sitting on my completely unoptimized laptop that I query regularly query. When the queries are properly indexed, they return in milliseconds.
I will say that your obsession with scale is a bit misplaced in your first database design project. Don't sacrifice normalization for scale until you really know what you're doing around database design.... probably a minimum of 5 years experience under your belt before you start making those kinds of decisions.
-----Original Message----- From: support-bounces@drupal.org on behalf of Pat Johnston Sent: Fri 3/25/2011 20:00 To: support@drupal.org Subject: Re: [support] Doubt on Index and Primary key
Austin,
I haven't dealt with high row counts with MySQL - I don't know for sure. But the rows are small so you're not using too much space. I would think it should be OK with some performance tuning. I'd suggest indexing the Skillset table by UID and sID.That and maybe some decent hardware configuration should be able to handle it.
Any comments from the MySQL people?
Regards,
Pat
On 3/25/2011 7:24 PM, Austin Einter wrote:
Hi Pat Thanks for detail help. I really appreciate it. I did put the tables in excel sheet. It looks fine. But the only worry is skillset table row count. Assuming on average, one person has 10 different skills, then skillset table will have 10 entries per user. Assuming total 50,000,00 users (May be a higher number , but I want to design keeping this number high), then total number of rows in skillset table will be 50,000,00 x 10 = 50,000,000 which is really high. Is this going to be a bottle neck from MySql perspective, search time perspective?? I am attaching the excel sheet (please where I have put the tables as per suggestion), please have a look. Best Regards Austin
On Fri, Mar 25, 2011 at 2:27 PM, Pat Johnston <Pat@melrosecenter.com mailto:Pat@melrosecenter.com> wrote:
Austin, I think you're off to a good start. I might suggest that you do something about all of those skill-set tables. You could replace them with three tables and still have the information in easy reach: - A skillGroup table with -- sgID - an integer primary key -- name - name of the group, like 'Web Skills', 'Database' - A skills table with -- sID - an integer primary key -- sgID - ID of the skillGroup that the skill belongs to -- skill - the name of the skill, like 'PHP' or 'PostgreSql' - A skillset table with -- UID - your user ID -- sID - ID of the skill being rated -- skillRating - 0 or 1 or whatever The skillset entry links to the skill via the sID and to the user table (Table 1) with the UID. The skill table is then linked to the skillGroup via the sgID. This way you can add skill groups and skills as need without needing to create more tables. Good Luck, Pat On 3/24/2011 6:58 PM, Austin Einter wrote:Hi Pierre, David, Ursula Thanks for excellent piece of information. I just went through basic database concepts like indexing, join, normalisation and tried to analyse how can I apply these for my job registration site implementation. After understanding a bit on normalisation and join, I have comeup with below approach for this specific case. Instead of having a single table, and comma separated values in table cells , I am going to split it multiple tables. I am attaching a table.xls file , please have a look. In that excel sheet, I have the main table, and I have broken the main table into 9 different tables. But I hope it need to be broken into more number of tables, depends on how many different kind of work domains are there. It may go to 50+ tables. Examples for work domains are - Web, PSTN, VoIP, NetworkManagement, GSM, Datbase, BoardDesign etc. So how many work domains are there, those many tables will be there. In those tables, a coulmn will represent a particular skill set. Say under Web Tables, coulmns can be HTML, PHP, Web2.0, Drupal, ASP, etc And under NetworkManagement table, NMS and SNMP can be coulmns. In table cells, I will keep either 1 or 0, depending on the person has that skill or not. Example: Lets say User 2 knows NMS and SNMP, User 3 knows only SNMP, User4 knows only SNMP, then the table will look as below. *UID* NMS SNMP 2 1 1 3 0 1 4 0 1 As first time I am doing this, I might be wrong. If so, kindly let me know. Best Regards Austin. On Fri, Mar 18, 2011 at 8:54 PM, Pierre Rineau <pierre.rineau@makina-corpus.com <mailto:pierre.rineau@makina-corpus.com>> wrote: Le vendredi 18 mars 2011 à 08:15 -0700, Metzler, David a écrit : > Pierre is spot on here. > > That is why most dbas would advise against storing this data in a comma separated list in a single field. An index cannot really be used to search within the text cause you are forcing to examine every row anyway. I can't programtically say lets start with the N's, now is there a nokia in there (that's an oversimplification intentionally to make a point). Rather I would make a single skill table that housed the values. If UID is the primary key for the resume, then you'd make a table with > > On a separate note, you do understand that the site that you're talking about building could be done without you writing ANY code? Basically the site you've described can be implemented with content_profile, cck and views modules, allowing you to build custom content types that are tied (one per user). You could then use taxonomys for skill sets an all this would be written for you? > > Dave Dave is right about the fact this simple business stuff could be done in many ways using D6 existing modules (even only with core and taxonomy) or D7 fields. But, if you really want to learn technical aspects of SQL and/or Drupal development, this is a good thing to start with this kind of simple business stuff. Pierre. -- [ Drupal support list | http://lists.drupal.org/ ]-- [ Drupal support list | http://lists.drupal.org/ ]